Content-aware Load Balancing for Distributed Backup
Authors
Abstract
When backing up a large number of computer systems to many different storage devices, an administrator has to balance the workload to ensure the successful completion of all backups within a particular period of time. When these devices were magnetic tapes, this assignment was trivial: find an idle tape drive, write what fits on a tape, and replace tapes as needed. Backing up data onto deduplicating disk storage adds both complexity and opportunity. Since one cannot swap out a filled disk-based file system the way one switches tapes, each separate backup appliance needs an appropriate workload that fits into both the available storage capacity and the throughput available during the backup window. Repeating a given client’s backups on the same appliance not only reduces capacity requirements but can also improve performance by eliminating duplicates from network traffic. Conversely, any reconfiguration of the mappings of backup clients to appliances suffers the overhead of repopulating the new appliance with a full copy of a client’s data. Reassigning clients to new servers should only be done when the need for load balancing exceeds the overhead of the move. In addition, deduplication offers the opportunity for content-aware load balancing that groups clients together for better deduplication, which can further improve both capacity and performance; we have seen a system with as much as 75% of its data overlapping other systems, though overlap around 10% is more common. We describe an approach for clustering backup clients based on content, assigning them to backup appliances, and adapting future configurations based on changing requirements while minimizing client migration. We define a cost function and compare several algorithms for minimizing this cost. This assignment tool resides in a tier between backup software such as EMC NetWorker and deduplicating storage systems such as EMC Data Domain.
∗Work done during an internship.
Tags: backups, configuration management, infrastructure, deduplication
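The abstract does not give the cost function or the algorithms that were compared, so the following is only a minimal sketch of the general idea, written under stated assumptions. The names `Client`, `Appliance`, `cost`, and `greedy_assign`, the weights, and the pairwise-overlap model are all hypothetical and are not taken from the paper. The sketch charges each appliance for the storage it consumes (crediting content shared by co-located clients), penalizes capacity and throughput overflow heavily, and charges a migration fee whenever a client leaves its previous appliance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Client:
    name: str
    size_gb: float    # data to protect (hypothetical unit)
    rate_mbps: float  # throughput needed during the backup window


@dataclass(frozen=True)
class Appliance:
    name: str
    capacity_gb: float
    throughput_mbps: float


def cost(assignment, clients, appliances, previous, overlap_gb,
         migration_weight=1.0, overflow_weight=1000.0):
    """Illustrative cost only; NOT the paper's cost function.

    assignment / previous: dict mapping client name -> appliance name.
    overlap_gb: dict mapping frozenset({name_a, name_b}) -> shared GB.
    """
    total = 0.0
    for app in appliances:
        members = [c for c in clients if assignment[c.name] == app.name]
        stored = sum(c.size_gb for c in members)
        # Simplification: credit pairwise overlap between co-located clients.
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                stored -= overlap_gb.get(frozenset((a.name, b.name)), 0.0)
        rate = sum(c.rate_mbps for c in members)
        total += stored  # storage consumed after deduplication
        total += overflow_weight * max(0.0, stored - app.capacity_gb)
        total += overflow_weight * max(0.0, rate - app.throughput_mbps)
    # Moving a client means re-sending a full copy to the new appliance.
    for c in clients:
        old = previous.get(c.name)
        if old is not None and old != assignment[c.name]:
            total += migration_weight * c.size_gb
    return total


def greedy_assign(clients, appliances, previous, overlap_gb):
    """Place clients largest-first on whichever appliance adds the least cost."""
    assignment = {}
    placed = []
    for c in sorted(clients, key=lambda c: c.size_gb, reverse=True):
        best_name, best_cost = None, float("inf")
        for app in appliances:
            trial = {**assignment, c.name: app.name}
            trial_cost = cost(trial, placed + [c], appliances,
                              previous, overlap_gb)
            if trial_cost < best_cost:
                best_name, best_cost = app.name, trial_cost
        assignment[c.name] = best_name
        placed.append(c)
    return assignment
```

In this sketch, two clients with heavy overlap tend to land on the same appliance because co-location lowers the stored total, while a client with a previous assignment stays put unless moving it saves more than the migration fee. A greedy pass is only one possible minimizer; the abstract states that several algorithms were compared but does not name them.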
Similar resources
Diss. ETH No. 17737: Interference-Aware Routing in Wireless Multihop Networks
Nowadays, wireless multihop networks providing mesh connectivity are emerging as an alternative network infrastructure for numerous applications such as shared broadband Internet access, monitoring for emergency, medical, and security reasons, distributed backup, and multimedia applications. Since these networks are highly decentralized and self-organized, routing becomes a critical factor for their perf...
Online Distribution and Load Balancing Optimization Using the Robin Hood and Johnson Hybrid Algorithm
Proper planning of assembly lines is one of the production managers’ concerns at the tactical level so that it would be possible to use the machine capacity, reduce operating costs and deliver customer orders on time. The lack of an efficient method in balancing assembly line can create threatening problems for manufacturing organizations. The use of assembly line balancing methods cannot balan...
Load Balancing on the Internet
Introduction; Workload Characteristics of Internet Services; Web Applications; Streaming Applications; Taxonomy of Load-Balancing Strategies; Load Balancing in the Server, the Network, and the Client Sides; State-Blind versus State-Aware Load Balancing; Load Balancing at Different Network Layers; Server-Side Load Balancing; DNS-Based Load Balancing; Dispatcher-Based Load Balancing; S...
Load Balancing Approach in Cloud Computing using Improvised Genetic Algorithm: A Soft Computing Approach
The concept of cloud computing has significantly changed the field of parallel and distributed computing systems. The major issues in the cloud are resource discovery, fault tolerance, load balancing, safety measures, task scheduling, dependability, data backup, and data portability. Load balancing is one of the essential responsibilities of cloud computing. In current situation, the load ba...
Load Balancing Approaches for Web Servers: A Survey of Recent Trends
Much work has been done on load balancing of web servers in grid environments. The reason behind the popularity of grid environments is that they allow access to distributed resources located at remote sites. For effective utilization, load must be balanced among all resources. Importance of load balancing is discussed by distinguishing the system between without load balancing and with loa...